Onset Detection Exploiting Adaptive Linear Prediction Filtering in DWT Domain with Bidirectional Long Short-Term Memory Neural Networks

Authors

  • G. Ferroni
  • E. Marchi
  • F. Eyben
  • L. Gabrielli
  • S. Squartini
  • B. Schuller
Abstract

This short paper presents an experimental algorithm for onset detection that applies features extracted in the wavelet domain, together with auditory spectral features, to Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks for decision-making. The algorithm exploits multi-resolution time-frequency features, using the discrete wavelet transform to decompose the input audio signal into sub-bands. Each sub-band is processed by a linear prediction error filter to obtain the prediction error. The prediction error, together with the wavelet coefficients, their temporal differences and the well-known auditory spectral features, is used as input to the supervised learning stage. The algorithm has been tested against the MIREX 2013 onset dataset.

This document is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. http://creativecommons.org/licenses/by-nc-sa/3.0/ © 2013 The Authors.

1. ALGORITHM DESCRIPTION

The main challenge of this task lies in the audio input representation, which should provide features that are optimal (in some sense) for onset recognition. Our approach is based on linear prediction filtering in the wavelet domain, as in [3], for the feature extraction. The main difference from the cited approach lies in the application of a bidirectional recurrent neural network with Long Short-Term Memory units (LSTM [6]) to obtain an Onset Detection Function (ODF).

Audio signals are generally composed of stationary or quasi-stationary parts and of transients. The transients violate the stationarity condition and play an important role in the human perception of music and, consequently, in onset detection. A signal modelled by a linear prediction filter gives a prediction error that tends to zero during the stationary parts, whereas at a note boundary the prediction error envelope increases. Consequently, the onset can be located by analysing the prediction error signal. Wavelet analysis is applied both to obtain a sub-band signal representation and because adaptive prediction filters converge quickly in the transformed domain [9].

[Figure 1. General algorithm block scheme: Feature Extraction, BLSTM RNN, Thresholding and Peak-Picking, Onsets. x[n] represents the discrete input audio file, F_{N×M} indicates the feature matrix and ODF is the onset detection function.]

In order to obtain the best audio input representation, the input signal x[n] is first decomposed into sub-bands using a dyadic filter bank based on wavelet filter coefficients. Each band is then modelled by a Linear Prediction Error Filter (LPEF), whose coefficients are updated by a modified version of a well-known adaptive technique, Normalized LMS (NLMS). We preferred an adaptive approach over a search for the optimal solution because the filter coefficients are continuously updated, so that non-stationary parts (i.e., note boundaries) produce a significant increase in the prediction error envelope.

The wavelet coefficients (i.e., the filter bank output signals) and the prediction errors (i.e., the LPEF output signals) have different lengths, so in order to use them as neural network inputs they are re-sampled at a predetermined rate and normalized. Their first-order positive differences are also computed. Finally, to obtain better performance, a subset of auditory spectral features [2] is added to the preceding set, leading to the feature matrix F_{N×M}, where M is the number of features and N is the "frame" index. This matrix is used as input to a bidirectional recurrent neural network with Long Short-Term Memory units (BLSTM). The network acts as a reduction operator leading to the ODF. A thresholding and peak-picking algorithm is then applied to the ODF in order to identify the onset positions. The algorithm block scheme is shown in Figure 1, and the individual blocks are described in the following sections.
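The behaviour of the prediction error at a note boundary can be illustrated with a minimal sketch. It uses the standard NLMS update, not the modified version mentioned in the text (whose details are not given here), and the function name, filter order, step size and toy signal are all assumptions chosen for illustration.

```python
import math

def nlms_prediction_error(x, order=8, mu=0.5, eps=1e-8):
    """One-step-ahead linear prediction error of x, adapted with standard NLMS.

    Illustrative only: order, mu and eps are assumed values, and the
    paper's modified NLMS update is not reproduced here.
    """
    w = [0.0] * order                      # predictor coefficients
    err = [0.0] * len(x)
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]        # most recent sample first
        pred = sum(wi * pi for wi, pi in zip(w, past))
        err[n] = x[n] - pred               # prediction error e[n]
        norm = eps + sum(p * p for p in past)
        step = mu * err[n] / norm          # normalized step size
        w = [wi + step * pi for wi, pi in zip(w, past)]
    return err

# Toy signal: a sinusoid whose frequency changes abruptly at sample 2000,
# standing in for a note boundary.
x = [math.sin(0.05 * math.pi * n) if n < 2000 else math.sin(0.31 * math.pi * n)
     for n in range(4000)]
err = nlms_prediction_error(x)

steady = max(abs(v) for v in err[1500:2000])   # stationary part
jump = max(abs(v) for v in err[2000:2100])     # just after the boundary
print(steady, jump)
```

On this toy signal the error stays close to zero while the sinusoid is stationary and rises sharply right after the frequency change, which is the property the onset detector relies on.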
1.1 Feature Extraction

Discrete mono audio files sampled at Fs = 44.1 kHz have been used for our experiments.

1.1.1 Discrete Wavelet Transformation

The input file is decomposed into sub-bands by a multi-resolution analysis computed with a dyadic filter bank (cf. Figure 2), as in [3].
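A dyadic filter bank can be sketched as follows. The Haar wavelet, the number of levels and the function name are assumptions made for illustration; the wavelet filters actually used are not specified in this excerpt.

```python
import math

def haar_dyadic_decompose(x, levels=3):
    """Split x into `levels` detail bands plus one approximation band.

    Illustrative dyadic filter bank with Haar filters (an assumption):
    at each level the low-pass branch is split again and both branches
    are downsampled by two.
    """
    s = math.sqrt(2.0)
    bands = []
    approx = list(x)
    for _ in range(levels):
        if len(approx) % 2:                # pad odd lengths by repeating the last sample
            approx.append(approx[-1])
        even, odd = approx[0::2], approx[1::2]
        detail = [(a - b) / s for a, b in zip(even, odd)]   # high-pass branch
        approx = [(a + b) / s for a, b in zip(even, odd)]   # low-pass branch
        bands.append(detail)
    bands.append(approx)                   # coarsest approximation comes last
    return bands

signal = [math.sin(0.1 * n) for n in range(1024)]
bands = haar_dyadic_decompose(signal, levels=3)
lengths = [len(b) for b in bands]
print(lengths)
```

Each level halves the length of the band it produces, which is why the sub-band signals have different lengths and must be re-sampled to a common rate before being stacked into a feature matrix.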

Similar resources

Prediction of Covid-19 Prevalence and Fatality Rates in Iran Using Long Short-Term Memory Neural Network

Introduction: The rapid spread of COVID-19 has become a critical threat to the world. So far, millions of people worldwide have been infected with the disease. The Covid-19 pandemic has had significant effects on various aspects of human life. Currently, prediction of the virus's spread is essential in order to be safe and make necessary arrangements. It can help control the rate of its outbrea...

Onset Detection for Piano Music Transcription Based on Neural Networks

Onset detection refers to the task of determining the physical starting time of notes or other musical events as they occur in a music recording. Various kinds of onset detection methods have been proposed in recent years. The goal of this paper is to choose a relatively appropriate method for onset detection. The neural network is discussed, especially the advanced bidirectional long short-ter...

Universal Onset Detection with Bidirectional Long Short-Term Memory Neural Networks

Many different onset detection methods have been proposed in recent years. However those that perform well tend to be highly specialised for certain types of music, while those that are more widely applicable give only moderate performance. In this paper we present a new onset detector with superior performance and temporal precision for all kinds of music, including complex music mixes. It is ...

Protein Secondary Structure Prediction with Long Short Term Memory Networks

Prediction of protein secondary structure from the amino acid sequence is a classical bioinformatics problem. Common methods use feed forward neural networks or SVMs combined with a sliding window, as these models do not naturally handle sequential data. Recurrent neural networks are a generalization of the feed forward neural network that naturally handle sequential data. We use a bidirect...

Publication date: 2013